EDA News, Monday, May 3, 2004
From: EDACafe

1st International System-on-Chip Conference
April 26 - 30, 2004
By Dr. Jack Horgan

Last week I attended the 1st International System-on-Chip Conference. Since it was the inaugural conference, I cannot comment on how it compared to last year's event. There were about 60 to 70 people in attendance. The two-day event had a single-track format. There were five main sessions:

- ASIC/SoC/Foundry for 90nm and Sub-90nm
- SoC Design Challenges
- Configurable CPUs and DSPs for SoC Platform Design
- SoC Design Using Programmable ICs and Structured ASIC
- System-on-Chip Platform Design

With one exception, each session had four speakers. There was considerable depth and breadth on the topic of SoC design. At the end of each day there was a panel discussion composed mostly of session speakers. There were also two keynote speeches for good measure, and a small exhibition was held at the end of the first day. All in all, an information-packed event.

In the presentations there was, understandably, considerable repetition of familiar material on the challenges and state of SoC design: Moore's Law, Sea of Gates, Crisis in Complexity, Designer Productivity Gap and so on. However, there was significant variation in how the various vendors approached the problems. Space does not permit a summary of all these approaches; instead, highlights from a few selected presentations are covered here.

An ASIC implementation is characterized, in relative terms, by low unit cost, high performance and low power consumption. The development of an ASIC involves significant NRE (non-recurring engineering) cost, time and risk, and an ASIC lacks post-silicon flexibility. An FPGA implementation has nearly the opposite profile.
Several years ago, the dominant uses of SoCs were in products that were expensive, had long lifetimes and had slow volume ramps. ASICs were a good fit for these applications. Today the major use of SoCs is in consumer products such as cell phones and DVD players, which are inexpensive and feature-rich, ship in large volumes with a steep ramp, have a limited power budget and have lifetimes of only a few months. Neither an ASIC nor an FPGA implementation is a perfect fit. Missing the market window can have a disastrous financial impact on a company. Meeting the market window with a product judged to be too expensive, lacking in features, requiring frequent recharging and so forth can have a similar impact. Even a successful product will require a rapid successor. The presenters at the conference described tools and methodologies aimed at improving one or more of the following metrics: unit cost, power consumption, performance and flexibility.

Tensilica, Inc. - configurable, extensible processor

Tensilica, Inc. was founded in 1997. The firm has raised $64 million in four rounds of funding, including $31 million in its latest round in April 2001. Major investors include Altera Corporation, Cisco Systems, Conexant Systems and a number of venture firms.

Tensilica observed that much of an SoC consists of hard-coded RTL blocks that implement application-specific functionality or address performance shortfalls of the on-chip processor. This RTL logic consists of a state machine (10% of the gates and 90% of the risk) and computational elements (90% of the gates and 10% of the risk). The company's approach replaces these RTL blocks with firmware plus designer-defined execution units and registers added to a predefined processor.

Tensilica's Xtensa is a configurable microprocessor architecture designed specifically for embedded SoC applications. An Xtensa processor is a configurable, extensible and synthesizable processor core.
The base architecture has 80 RISC instructions and includes a 32-bit ALU, 32 or 64 general-purpose 32-bit registers (using a register-windowing scheme that accelerates function calls) and 6 special-purpose registers. The patented instruction set architecture features a compact 16- and 24-bit instruction set optimized for embedded designs.

The system designer, or the hardware or software developer, uses the Web-based Xtensa Processor Generator interface to select the instruction set options, memory hierarchy, closely coupled building blocks and external interfaces required by the application. The designer can also describe additional data types, instructions and execution units using the Verilog-like Tensilica Instruction Extension (TIE) language. TIE can also describe new registers, register files and custom data types, such as 24-bit data for audio applications, 56-bit data for security processing and 256-bit data for packet processing. The Xtensa Processor Generator then produces both the complete synthesizable hardware design and the tailored software environment in a matter of hours.

The synthesizable hardware can be immediately integrated into the rest of the SoC design and is easily ported to any fabrication process. Software development, system-level simulation and tuning can also start immediately, using the profiler, various simulation models and overlays for supported RTOSes. The Xtensa Xplorer IDE is based in part on the open-source Eclipse platform for tool integration. Support is also provided for COTS operating systems and IDEs. Using the execution profiler, the designer can analyze the efficiency of an application program and evaluate where TIE can be used to accelerate the software. The designer can explore multiple architectures, making design tradeoffs based on real-time feedback from the processor generator.
The designer can weigh the benefit of adding instructions and TIE hardware before committing to silicon.

The Xtensa software development environment consists of industry-standard GNU tools, including a C/C++ compiler (gcc), assembler, linker and debugger (gdb). This environment is generated from the same database as the processor hardware description, which ensures correctness and consistency by construction. The software tool chain is automatically updated and optimized to make use of the designer-defined instructions added during the hardware-generation process.

The company presented a few practical examples, including a GSM audio codec used in a cell phone. Profiling the code running on a RISC processor showed that 80% of the time was spent executing multiplications. Adding a multiplier as a configuration option reduced the number of cycles needed to execute the code by a factor of 7. Using a multiplier/accumulator instead reduced the cycle count by a factor of about 12.

Tensilica cited impressive benchmark results from EEMBC (Embedded Microprocessor Benchmark Consortium). EEMBC is a non-profit consortium, funded by over 40 member companies, that provides independently certified benchmark scores relevant to deeply embedded processor applications. There are five different benchmark suites. EEMBC supports both "out-of-the-box" and optimized performance scores; optimized scores allow C coding changes and assembly coding to accelerate performance.

Because extended processors use firmware instead of RTL-defined hardware for their control algorithms, it is easier and faster to develop and verify processor-based task engines for many embedded SoC tasks than to develop and verify RTL-based hardware that performs the same tasks.
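To make the multiplier versus multiplier/accumulator distinction in the GSM codec example above a little more concrete, here is a minimal Python sketch of the kind of fixed-point inner loop that speech codecs spend much of their time in. It assumes 16-bit Q15 samples and coefficients and a saturating 32-bit accumulator; the helper names (`q15_mult`, `q15_mac`, `dot_q15`) are hypothetical, and this is neither TIE code nor the codec Tensilica profiled. With only a multiplier, each tap still needs separate multiply and add steps; a MAC unit folds them into one operation per tap, which is presumably where the additional cycle savings come from.

```python
# Hypothetical Q15 fixed-point helpers; names and exact semantics are illustrative.
INT32_MIN = -(1 << 31)
INT32_MAX = (1 << 31) - 1


def saturate32(x: int) -> int:
    """Clamp an integer to the signed 32-bit range."""
    return max(INT32_MIN, min(INT32_MAX, x))


def q15_mult(a: int, b: int) -> int:
    """Multiply two Q15 values into a Q31 result (product shifted left by one).

    The only input pair that overflows is (-32768, -32768), which saturates.
    """
    return saturate32((a * b) << 1)


def q15_mac(acc: int, a: int, b: int) -> int:
    """Multiply-accumulate: the step a MAC unit performs as one operation."""
    return saturate32(acc + q15_mult(a, b))


def dot_q15(samples: list[int], coeffs: list[int]) -> int:
    """Filter-style dot product of Q15 samples and coefficients into a Q31 accumulator."""
    acc = 0
    for s, c in zip(samples, coeffs):
        # With only a multiplier, this is a multiply step followed by an add step;
        # a MAC unit performs both as a single operation per tap.
        acc = q15_mac(acc, s, c)
    return acc


if __name__ == "__main__":
    half = 1 << 14  # 0.5 in Q15
    print(dot_q15([half, half], [half, half]))  # 0.5*0.5 + 0.5*0.5 = 0.5 in Q31 -> 1073741824
```

Saturation, rather than wraparound, is used here because it is the usual choice in speech-codec arithmetic; a real extension instruction could make a different choice.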
On average, a Tensilica customer uses five Xtensa processors per SoC design, each tuned for a different purpose.

Elixent - configurable processing array

Elixent was founded in October 2000 in Bristol, England, as a spin-off from Hewlett-Packard's Research Laboratories. The four founders had worked together at HP for four years developing the concept of a Reconfigurable Algorithm Processing (RAP) platform. The company's initial $14 million in funding came from the venture capital firm 3i Group and from industrial investors HP and Actel. Venture capital firms invested an additional $10 million in a second round in July 2003.

The first public demonstration of its RAP technology took place in October 2002 at the CEATEC show in Japan. The company encoded an image as JPEG using a single piece of silicon that ran the different algorithms needed for the task, demonstrating the performance and die-area improvements possible with reconfigurable technology. In January 2003 Elixent entered into an agreement with Toshiba to jointly develop a platform SoC that integrates Elixent's D-Fabrix reconfigurable algorithm processing array with Toshiba's MeP configurable processor core. Both companies will use this SoC as a reconfigurable evaluation and development platform.

The basis for the company's technology is the D-Fabrix processing array. The components of the array are 4-bit ALUs, registers and the 'switchbox', which are combined into a 'tile'. Hundreds or thousands of tiles are then combined to create the fine-grained D-Fabrix array. Special functions are distributed through the array, and algorithms are implemented directly in hardware. The result combines the performance, power and area benefits of hardware with the flexibility of a software configuration. RAP is a powerful approach to implementing algorithms that need high arithmetic throughput at low cost.
The ALUs are laid out on the array chessboard-style, alternating with adjacent 'switchboxes', each of which can serve either as a crosspoint switch or as 64 bits of configuration memory. In addition, 256-byte memory blocks can be inserted as required. This scheme provides extremely flexible interconnectivity: each ALU has input and output buses on all four sides and can send data to or receive data from any of its eight surrounding ALUs. It also greatly simplifies interconnect and routing, minimizing the silicon overhead needed for programmability.

With this platform in place, algorithms are mapped onto the array. This is done by laying out the signal flow across the array, described in an HDL such as Verilog, in a higher-level language such as Handel-C, or in MATLAB. If an 8-bit adder is needed, use two ALUs; if a 32-bit adder is required, use eight ALUs; if an Add/Compare/Select unit is needed, just use a few ALUs. Once the math units are in place, the switchboxes link them together, forming a rich interconnect that provides both local and global connectivity.

Elixent's D-Fabrix RAP platform implements algorithms in "Virtual Hardware", allowing a hardware accelerator to be created for every algorithm in a system. Because it is reconfigurable, it can implement multiple hardware accelerators in the same silicon area, giving high silicon utilization. This reconfigurability also allows functionality to be added or changed after fabrication, so that bugs can be fixed, new functions added or even the whole chip customized.

Elixent's patented IP is designed to be embedded in complex system chips used in applications such as digital cameras and printers. For example, the technology could give a digital camera performance improvements such as reduced shot-to-shot and shutter delay.
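The adder sizes quoted above follow from the 4-bit width of each D-Fabrix ALU: an N-bit adder needs ceil(N/4) ALUs, with each ALU handling one 4-bit slice and passing its carry to the next. The Python sketch below is a hypothetical behavioral model of that arithmetic only. It assumes a simple ripple carry between 4-bit slices, and the function names are invented for illustration; it says nothing about how Elixent's tools actually place, route or pipeline a carry chain.

```python
def alu4_add(a: int, b: int, carry_in: int) -> tuple[int, int]:
    """Behavioral model of one 4-bit ALU in add mode: returns (4-bit sum, carry out)."""
    total = (a & 0xF) + (b & 0xF) + carry_in
    return total & 0xF, total >> 4


def alus_needed(width: int) -> int:
    """Number of 4-bit ALUs for a width-bit adder (ceiling of width / 4)."""
    return -(-width // 4)


def chained_add(a: int, b: int, width: int) -> tuple[int, int]:
    """Add two unsigned width-bit values by rippling the carry through 4-bit ALUs.

    Returns (sum modulo 2**width, carry out of the top ALU).
    """
    if width <= 0 or width % 4:
        raise ValueError("this sketch assumes a positive width that is a multiple of 4")
    result, carry = 0, 0
    for i in range(alus_needed(width)):
        shift = 4 * i
        # Each ALU sees one nibble of each operand plus the previous ALU's carry.
        nibble, carry = alu4_add(a >> shift, b >> shift, carry)
        result |= nibble << shift
    return result, carry


if __name__ == "__main__":
    print(alus_needed(8), alus_needed(32))                  # 2 8
    print(chained_add(0xFFFFFFFF, 1, 32))                   # (0, 1)
    print(hex(chained_add(0x12345678, 0x11111111, 32)[0]))  # 0x23456789
```

A ripple carry is the simplest way to chain slices; its delay grows with the number of ALUs, which is one reason a real mapping might insert pipeline registers between slices.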
The reconfigurable nature of the company's technology would enable the same chip to deliver a wide range of marketable features, such as improved image quality, improved compression formats and innovative special effects. Elixent's standard way of delivering RAP is the DFA-1000 silicon IP, which allows customers to integrate the D-Fabrix array into their own chips.

ARC International - user-configurable cores

ARC Cores was originally a business unit of Argonaut, a games developer. In 1998 ARC invented a configurable microprocessor core that was licensed by Nintendo. In 1999 the firm introduced a complete IDE and soon followed with an RTOS and other software and middleware. For 2003 the company had revenues of £10.7 million.

The ARCtangent microprocessor is a 32-bit user-customizable core for ASIC, SoC, ASSP and FPGA development. Because the synthesizable core is delivered in HDL, the ARCtangent processor is portable to almost any manufacturing process, synthesis library and foundry. At its heart is a 32-bit RISC architecture with a four-stage instruction pipeline and a mixed 16/32-bit instruction set optimized for code density. Most instructions execute in a single cycle and support optional conditional execution. The compact ISA reduces code size, improves code efficiency and provides a large instruction expansion space; to maximize code-size reduction, equivalent 16-bit instructions have been implemented for the most frequently used 32-bit operations. Developers can modify and extend the instruction set for specific applications to optimize performance, I/O throughput, power consumption, silicon area and cost. Designers can add DSP functionality and merge RISC and DSP functions onto a single processor, saving further silicon area and power. Thanks to multiple CPU I/O interfaces and a low gate count, the ARCtangent processor lends itself to multiprocessor designs.
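The code-density benefit of a mixed 16/32-bit instruction set comes down to simple arithmetic: if a fraction f of a program's instructions can use 2-byte encodings and the rest need 4 bytes, the code shrinks to (2f + 4(1 - f)) / 4 = 1 - f/2 of its all-32-bit size. The Python sketch below computes this under a deliberately simplified, hypothetical model. It ignores alignment padding and long-immediate words, and the 60% figure in the example is made up for illustration; it is not an ARC measurement.

```python
def mixed_isa_code_size(n_instructions: int, frac_16bit: float) -> int:
    """Bytes of code when frac_16bit of the instructions use 2-byte encodings
    and the remainder use 4-byte encodings (simplified, hypothetical model)."""
    if not 0.0 <= frac_16bit <= 1.0:
        raise ValueError("frac_16bit must be between 0 and 1")
    n16 = round(n_instructions * frac_16bit)
    n32 = n_instructions - n16
    return 2 * n16 + 4 * n32


def size_ratio(frac_16bit: float) -> float:
    """Code size relative to an all-32-bit encoding: (2f + 4(1 - f)) / 4 = 1 - f/2."""
    return 1.0 - frac_16bit / 2.0


if __name__ == "__main__":
    # Made-up example: 10,000 instructions, 60% of which have 16-bit forms.
    print(mixed_isa_code_size(10_000, 0.6))  # 28000 bytes instead of 40000
    print(f"{size_ratio(0.6):.2f}")          # 0.70
```

The formula also shows the ceiling on the benefit: even if every instruction had a 16-bit form, code would shrink by at most half, which is why the 16-bit encodings are reserved for the most frequently used operations.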
The processor is supported by development tools, including a configuration tool with a graphical "point and click" user interface. The tool has a range of options to build HDL, synthesis scripts, a test bench and HTML documentation. The MetaDeveloper tool suite includes a C/C++ compiler, assembler, linker, profiler and debugger. This tool chain fully supports the capabilities of the ARCtangent processor, including multiprocessor debugging and extensibility for processor customization. ARC also provides a royalty-free real-time operating system.

ARC offers the ARCangel, an FPGA-based development board that supports the configurability and extensibility of the ARCtangent processor. There is plenty of capacity for adding custom interfaces, coprocessor modules, application-specific interfaces or even more processors. The ARCtangent configuration tool can target HDL builds at this device, so developers can generate and test their processor configurations at MHz speeds.

Dr. Nader Bagherzadeh, Professor of Electrical Engineering and Computer Science at UC Irvine, gave a presentation on reconfigurable digital signal processors. During the presentation he referred to several vendors, including IPFlex and picoChip.

picoChip Designs Limited - reconfigurable signal processors

picoChip is a fabless semiconductor company based in Bath, UK, targeting the 3G basestation market. The firm was founded in September 2000, raised $7M in a first-round investment in June 2001 and raised an additional $17M in October 2003. A typical basestation design requires radio frequency and power amplifier expertise, as well as a high-speed baseband that executes specific DSP operations and complex control protocols. Basestation solutions have traditionally used custom ASIC/FPGA technology together with DSP and embedded processor devices, and integrating these differing technologies is a major challenge.
picoChip offers a scalable, multiprocessor baseband IC that combines the computational density of a dedicated ASIC with the programmability of a traditional high-end digital signal processor, along with a rich programming environment and comprehensive system libraries. The picoArray itself is a massively parallel array of about 400 processor cores on a single die, linked by a deterministic high-speed interconnect fabric of 32-bit buses; it is well described as a "Software System On Chip" (SSOC). Each core is a capable 16-bit device with local data and program memory, roughly equivalent to an ARM9 for control tasks or a TI C5x for DSP roles. Because the cores can operate in parallel or in concert, and because of the very high bandwidth of the on-chip buses, the picoArray can deliver a huge amount of processing power (more than 100 giga-operations per second).

Multiple array elements can be programmed together as a group to perform particular functions, ranging from fast processing such as filters and correlators through to the most complex control tasks. Each element is allocated a series of simple tasks, which avoids the problems of statistical multiplexing of resources and run-time scheduling; performance is therefore entirely deterministic, simplifying development and verification. The architecture is heterogeneous, with four types of RISC processors that share a common instruction set but have varying amounts of memory and additional instructions for certain wireless baseband control and digital signal processing functions.

Complementing the device is a complete development tool chain and a comprehensive systems library, together providing a complete baseband platform for infrastructure. Each processor can be programmed in either C or efficient assembly code, while VHDL is used to describe the inter-processor relationships.
There is no actual VHDL programming and no need for VHDL simulation; only structural VHDL is used to define the relationships between elements. This approach allows algorithms to be partitioned and mapped efficiently onto specific processing elements at a relatively high level. It also allows new or existing C code to be used to add functions, maximizing code reuse and exploiting existing programming skills for rapid prototyping.

The picoChip platform is intended for wireless infrastructure, where it supports reconfiguration on an "occasional" basis, perhaps every few hours or days. Examples include upgrading to a new release of a standard, incorporating an improved algorithm, or switching between peak and off-peak operating modes. In such an update, the picoArray is reset, all the elements are reprogrammed and the interconnection fabric is completely redefined.

IPFlex, Inc. - dynamically reconfigurable processors

IPFlex, a Japanese corporation, was founded in March 2000 as a fabless semiconductor company focused on developing dynamically reconfigurable processors and the associated integrated development software. The company has raised $12 million in funding. In December 2002, IPFlex and Fujitsu formed an equity collaboration. In March 2004 the two firms announced the commercial release of their jointly developed processor, the DAP/DNA-2 (Digital Application Processor/Distributed Network Architecture), and they expect to begin shipping sample quantities in Japan in mid-May.

The DAP/DNA dynamically reconfigurable processor is a dual-core design comprising a high-performance RISC core (DAP) and a dynamically reconfigurable processor core (DNA); it is a platform that provides hardware performance while maintaining software flexibility. The DAP/DNA dynamically reconfigurable processor series is provided with DAP/DNA-FW II as its integrated software development environment.
It provides compilers for algorithms written in MATLAB/Simulink and in C with data-flow extensions, enabling algorithm design at a high level of abstraction while leveraging users' existing intellectual property.

The DAP/DNA-2 is a microprocessor that contains multiple processing elements (PEs) and can configure its internal circuits to best suit the application at hand. The function of each PE, as well as the connections among PEs, can be reconfigured not only when the system is built but also while it is running, with reconfiguration completing within one clock cycle. The DAP/DNA-2 lays out these PEs in a two-dimensional array so that it can quickly and flexibly change their functions and the connections between them. Dynamic reconfiguration makes it possible for a single DAP/DNA-2 to handle multiple functions that previously required several specialized chips. A single algorithm can also be partitioned in time and executed in stages. DAP/DNA's integrated development environment enables algorithm development in high-level languages, which increases design productivity, shortens development time and reduces cost.

Afterword

During one of the panel discussions, the question arose of how far programmability and configurability could "future-proof" a product. It was readily conceded that these capabilities can be exploited to support multiple members of a product family, fix bugs, add features and adapt to changing protocols. However, the approach has its limits. The consensus was that the product marketing function cannot be eliminated: someone must look ahead in time and across product lines to foresee future feature-set, performance, power and cost requirements. There is a tradeoff of overhead today for headroom tomorrow.
Weekly Industry News Highlights

- Novas Expands Deployment Of Debug System Within Ricoh's Electronic Devices Company
- Solution for Wire Harness Design Cuts Cabling Weight, Size and Cost While Optimizing Performance
- Mentor Graphics and X-FAB Provide New Production-Proven Design Kits for Mentor's Mixed-Signal IC Design Flow
- Magma Signs Multi-Year Worldwide Licensing Agreement with NEC Electronics
- Esterel Technologies and ENSCO Inc. Team on Products and Services for Safety-Critical and Mission-Critical Applications
- Toshiba Tapes Out Multiple 90-Nanometer SoC Designs With Synopsys' Galaxy Design Platform
- Magma Announces Support for Virage Logic Structured ASIC Design Libraries
- Freescale Semiconductor Reveals PowerPC® Core Roadmap and Scalable System-on-Chip Platforms
- Atmel Introduces a PC7447A Microprocessor for Extended-Reliability Applications
- VCX Software To Provide IP Data, Supply Chain Capabilities to Chip Estimation and Optimization Expert Giga Scale IC
- Giga Scale IC Rouses Electronics Industry with InCyte; Industry's First Specification, Optimization Tools Model Nanometer Physical Effects

Copyright © 2004, Internet Business Systems, Inc. - 11208 Shelter Cove, Smithfield, VA 23420 - 888-44-WEB-44 - All rights reserved.